Graph Attention for Automated Audio Captioning
Authors
Abstract
State-of-the-art audio captioning methods typically use the encoder-decoder structure with pretrained audio neural networks (PANNs) as encoders for feature extraction. However, the convolution operation used in PANNs is limited in capturing the long-time dependencies within an audio signal, thereby leading to potential performance degradation in audio captioning. This letter presents a novel method using graph attention (GraphAC) for encoder-decoder based audio captioning. In the encoder, a graph attention module is introduced after the PANNs to learn contextual association (i.e., the dependency among the audio features over different time frames) through an adjacency graph, and a top-k mask is applied to mitigate interference from noisy nodes. The learnt contextual association leads to a more effective feature representation with feature node aggregation. As a result, the decoder can predict important semantic information about the acoustic scene and events based on the contextual associations learned from the audio signal. Experimental results show that GraphAC outperforms the state-of-the-art methods with PANNs as the encoders, thanks to the incorporation of the graph attention module into the encoder. The source code is available at https://github.com/LittleFlyingSheep/GraphAC.
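The encoder described above builds an attention-weighted adjacency graph over time-frame feature nodes and prunes it with a top-k mask before aggregating. A minimal single-head sketch of that idea in NumPy is shown below; this is an illustration, not the authors' GraphAC implementation, and the projection `W`, attention vector `a`, and leaky-ReLU scoring are generic graph-attention choices assumed here.

```python
import numpy as np

def graph_attention_topk(X, W, a, k):
    """Single-head graph attention over time-frame feature nodes with a
    top-k adjacency mask (simplified sketch, not the paper's code).

    X: (T, d_in)  audio features, one node per time frame
    W: (d_in, d_out)  shared linear projection
    a: (2 * d_out,)  attention vector
    k: number of neighbours kept per node
    """
    H = X @ W                                  # project node features
    d_out = H.shape[1]
    # Pairwise attention logits: leaky-ReLU of a^T [h_i || h_j]
    left = H @ a[:d_out]                       # contribution of node i
    right = H @ a[d_out:]                      # contribution of node j
    e = left[:, None] + right[None, :]
    e = np.where(e > 0, e, 0.2 * e)            # leaky ReLU
    # Top-k mask: keep the k largest logits per row, prune the rest
    pruned = np.argsort(e, axis=1)[:, :-k]     # indices of dropped edges
    np.put_along_axis(e, pruned, -np.inf, axis=1)
    # Row-wise softmax yields the learnt adjacency weights
    e = e - e.max(axis=1, keepdims=True)
    A = np.exp(e)
    A /= A.sum(axis=1, keepdims=True)
    return A @ H, A                            # aggregated features, adjacency

rng = np.random.default_rng(0)
T, d_in, d_out, k = 6, 8, 4, 3
X = rng.standard_normal((T, d_in))
W = rng.standard_normal((d_in, d_out))
a = rng.standard_normal(2 * d_out)
H_out, A = graph_attention_topk(X, W, a, k)
```

Each row of `A` is a probability distribution over the k retained neighbours of one time frame, so noisy nodes with low attention scores are excluded from the aggregation, which is the stated purpose of the top-k mask.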
Similar Resources
Image Captioning with Attention
In the past few years, neural networks have fueled dramatic advances in image classification. Emboldened, researchers are looking for more challenging applications for computer vision and artificial intelligence systems. They seek not only to assign numerical labels to input data, but to describe the world in human terms. Image and video captioning is among the most popular applications in this t...
Image Captioning using Visual Attention
This project aims at generating captions for images using neural language models. There has been a substantial increase in the number of proposed models for the image captioning task since neural language models and convolutional neural networks (CNNs) became popular. Our project has its base on one such work, which uses a variant of a recurrent neural network coupled with a CNN. We intend to enhance t...
Text-Guided Attention Model for Image Captioning
Visual attention plays an important role to understand images and demonstrates its effectiveness in generating natural language descriptions of images. On the other hand, recent studies show that language associated with an image can steer visual attention in the scene during our cognitive process. Inspired by this, we introduce a text-guided attention model for image captioning, which learns t...
Automated closed captioning for Russian live broadcasting
The paper describes a hardware-software system for real-time closed captioning of Russian live TV broadcasts. The use of respeaking technology enabled us to create an ASR system with WER not exceeding 5.5%. Editing closed captions in real time further reduces WER down to 0.2%. In the paper we report some advancements in LMs for a highly inflected language and also in using morphological rescori...
Video Captioning with Multi-Faceted Attention
Recently, video captioning has been attracting an increasing amount of interest, due to its potential for improving accessibility and information retrieval. While existing methods rely on different kinds of visual features and model structures, they do not fully exploit relevant semantic information. We present an extensible approach to jointly leverage several sorts of visual features and sema...
Journal
Journal title: IEEE Signal Processing Letters
Year: 2023
ISSN: 1558-2361, 1070-9908
DOI: https://doi.org/10.1109/lsp.2023.3266114